222 research outputs found
A novel framework for assessing metadata quality in epidemiological and public health research settings
Metadata are critical in epidemiological and public health research. However, a lack of biomedical metadata quality frameworks and limited awareness of the implications of poor-quality metadata render data analyses problematic. In this study, we created and evaluated a novel framework to assess metadata quality of epidemiological and public health research datasets. We performed a literature review and surveyed stakeholders to enhance our understanding of biomedical metadata quality assessment. The review identified 11 studies and nine quality dimensions, none of which were specifically aimed at biomedical metadata. In total, 96 individuals completed the survey; of those who submitted data, most assessed metadata quality only sometimes, and eight did not assess it at all. Our framework has four sections: a) general information; b) tools and technologies; c) usability; and d) management and curation. We evaluated the framework using three test cases and sought expert feedback. The framework can assess biomedical metadata quality systematically and robustly.
Can primary care electronic health records facilitate the prediction of early cognitive decline associated with dementia: a systematic literature review
Introduction: Identifying the early stages of dementia is key in care management, clinical trial recruitment and mitigating the impact of cognitive impairment. At present, cognitive tests are most commonly used to investigate early stages of dementia and are often only conducted after initial symptoms of cognitive decline have been identified. There is potential to harness routinely collected data from electronic health records (EHR) to discover markers of early-stage dementia, both in its cognitive and non-cognitive manifestations. However, the extent to which primary care EHR can facilitate earlier diagnosis of dementia has not been systematically examined. We aim to determine the extent to which EHR can be utilized to identify prodromal dementia in primary care settings through a systematic review of the literature. Method: We searched electronic medical databases (including Scopus, Web of Science, OvidSP, MEDLINE and PsycINFO) for potentially relevant studies written in English and published up to and including September 2016. We used the following MeSH search terms: "dementia" (including its subtypes), "electronic health records" (and variations thereof) and "primary care". Additionally, grey literature was searched, including reports released by the government, councils and relevant major UK charities. Results: We identified and reviewed 31 studies. In total, 35 risk factors and 147 potential markers of early cognitive decline were identified. There was considerable variability across studies as to whether markers were classed as confounders, risk factors, early markers or co-morbidities. Markers predominantly fell within cognitive, affective, motor and autonomic symptoms, prescription patterns of both dementia and non-dementia medication, and health system utilization, including type of consultation, frequency of contact and duration. Three studies investigated variation in the markers' predictive strengths at different time points during the prodromal period of dementia. In the 24 months prior to diagnosis of dementia, gait disturbances, changes in weight, number of consultations, specialty referrals and hospital admissions showed the strongest association with dementia diagnosis. Number of consultations, unpredictability in consulting patterns (such as "Did not attend"), and carer and social care involvement showed the strongest association with dementia diagnosis during a longer prodromal period (up to 54 months). Discussion: Tests which specifically investigate cognitive health, such as the Mini-Mental State Examination (MMSE), are often only conducted in the period of Mild Cognitive Impairment (MCI) preceding dementia diagnosis, once irremediable damage has occurred. In many cases, these symptoms are conflated with normal ageing or affective disorders, or attenuated by multimorbidities, and are therefore not directly linked to dementia. These results show that there is a broad range of potential markers which could be used to better define prodromal dementia; however, very little literature has been published in this area. Conclusion: There is significant potential to use routinely collected data from EHR to investigate and define prodromal dementia. The use of EHR allows us to obtain a more complete understanding of early-stage dementia according to its more commonly investigated cognitive signs, as well as non-cognitive presentations. Understanding the breadth and trajectories in the prodromal dementia period will be key in facilitating earlier diagnosis.
Classification of atherothrombotic events in myocardial infarctions survivors with supervised machine learning using data from an electronic health record system
The aim was to build a prediction model for subsequent atherothrombotic events in patients who survived a myocardial infarction. The dataset contained 7,582 patients from a national electronic health record. The prediction target is a binary outcome (event or no event) in the five years after a myocardial infarction. Different classifiers were tested, and XGBoost achieved the best F1-score (0.76). The top features were imd_score, age_at_entry, egfr_ckdepi_base, height, and SBP_base.
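For readers unfamiliar with the metric, the F1-score reported above is the harmonic mean of precision and recall for the positive class (here, "event"). The following is a minimal sketch in plain Python; the function name and the toy labels are illustrative assumptions, not data or code from the study.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for a binary outcome: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels invented for illustration: 1 = event, 0 = no event.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(round(f1_score(y_true, y_pred), 2))
```

Because F1 ignores true negatives, it can be more informative than accuracy when events are much rarer than non-events, which is often the case for clinical outcomes.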
Discovering and validating disease subtypes for heart failure using unsupervised machine learning methods
Notable heterogeneity exists in the clinical presentation of heart failure (HF) patients. Current subtype classifications based on ejection fraction may not fully capture the aetiological and prognostic heterogeneity of HF. The use of unsupervised machine learning (ML) approaches, such as cluster analysis, on large-scale observational data from electronic health records (EHR) can enable the discovery of novel subtypes and guide the characterization of their clinical manifestation. Clustering methods can group HF patients based on similarities between their clinical features without making a priori assumptions about the distribution of the data. We sought to discover, characterize and replicate HF subtypes by applying a clustering method on a heterogeneous HF population derived from phenotypically rich EHR. Characterization of HF subtypes using EHR-derived variables may enable more precise large-scale genomic analysis to inform better prevention, diagnostic and treatment strategies.
Identifying and evaluating clinical subtypes of Alzheimer's disease in care electronic health records using unsupervised machine learning
BACKGROUND: Alzheimer's disease (AD) is a highly heterogeneous disease with diverse trajectories and outcomes observed in clinical populations. Understanding this heterogeneity can enable better treatment, prognosis and disease management. Studies to date have mainly used imaging or cognition data and have been limited in terms of data breadth and sample size. Here we examine the clinical heterogeneity of Alzheimer's disease patients using electronic health records (EHR) to identify and characterise disease subgroups using multiple clustering methods, identifying clusters which are clinically actionable. METHODS: We identified AD patients in primary care EHR from the Clinical Practice Research Datalink (CPRD) using a previously validated rule-based phenotyping algorithm. We extracted and included a range of comorbidities, symptoms and demographic features as patient features. We evaluated four different clustering methods (k-means, kernel k-means, affinity propagation and latent class analysis) to cluster Alzheimer's disease patients. We compared clusters on clinically relevant outcomes and evaluated each method using measures of cluster structure, stability, efficiency of outcome prediction and replicability in external data sets. RESULTS: We identified 7,913 AD patients, with a mean age of 82 and 66.2% female. We included 21 features in our analysis. We observed 5, 2, 5 and 6 clusters in k-means, kernel k-means, affinity propagation and latent class analysis respectively. K-means was found to produce the most consistent results based on four evaluative measures. We discovered a consistent cluster, found in three of the four methods, composed predominantly of female patients with younger disease onset (43% between ages 42-73) who were diagnosed with depression and anxiety and had a quicker rate of progression than the average across other clusters.
CONCLUSION: Each clustering approach produced substantially different clusters, and k-means performed the best of the four methods based on the four evaluative criteria. However, the consistent appearance of one particular cluster across three of the four methods potentially suggests the presence of a distinct disease subtype that merits further exploration. Our study underlines the variability of the results obtained from different clustering approaches and the importance of systematically evaluating different approaches for identifying disease subtypes in complex EHR.
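One simple way to quantify how far two clustering methods agree on the same patients is the Rand index: the fraction of patient pairs that both methods treat the same way (together in both, or apart in both). The abstract does not say which agreement or stability measures were used, so the sketch below is an illustrative assumption with invented labels, not the study's evaluation code.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = sum(
        # A pair counts as agreement if it is co-clustered in both
        # labelings or separated in both labelings.
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical cluster labels for six patients from two methods.
method_1 = [0, 0, 0, 1, 1, 2]
method_2 = [1, 1, 0, 0, 0, 2]
print(round(rand_index(method_1, method_2), 3))
```

The index depends only on which patients are grouped together, not on the label values, so two methods that number their clusters differently can still score 1.0.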
Neutrophil Counts and Initial Presentation of 12 Cardiovascular Diseases: A CALIBER Cohort Study
BACKGROUND: Neutrophil counts are a ubiquitous measure of inflammation, but previous studies on their association with cardiovascular disease (CVD) were limited by small numbers of patients or a narrow range of endpoints. OBJECTIVES: This study investigated associations of clinically recorded neutrophil counts with initial presentation for a range of CVDs. METHODS: We used linked primary care, hospitalization, disease registry, and mortality data in England. We included people 30 years or older with complete blood counts performed in usual clinical care and no history of CVD. We used Cox models to estimate cause-specific hazard ratios (HRs) for 12 CVDs, adjusted for cardiovascular risk factors and acute conditions affecting neutrophil counts (such as infections and cancer). RESULTS: Among 775,231 individuals in the cohort, 154,179 had complete blood counts performed under acute conditions and 621,052 when they were stable. Over a median 3.8 years of follow-up, 55,004 individuals developed CVD. Adjusted HRs comparing neutrophil counts 6 to 7 versus 2 to 3 × 10^9/l (both within the 'normal' range) showed strong associations with heart failure (HR: 2.04; 95% confidence interval [CI]: 1.82 to 2.29), peripheral arterial disease (HR: 1.95; 95% CI: 1.72 to 2.21), unheralded coronary death (HR: 1.78; 95% CI: 1.51 to 2.10), abdominal aortic aneurysm (HR: 1.72; 95% CI: 1.34 to 2.21), and nonfatal myocardial infarction (HR: 1.58; 95% CI: 1.42 to 1.76). These associations were linear, with greater risk even among individuals with neutrophil counts of 3 to 4 versus 2 to 3 × 10^9/l. There was a weak association with ischemic stroke (HR: 1.36; 95% CI: 1.17 to 1.57), but no association with stable angina or intracerebral hemorrhage. CONCLUSIONS: Neutrophil counts were strongly associated with the incidence of some CVDs, but not others, even within the normal range, consistent with underlying disease mechanisms differing across CVDs.
(White Blood Cell Counts and Onset of Cardiovascular Diseases: a CALIBER Study [CALIBER]; NCT02014610)
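A reported hazard ratio and its confidence interval can be sanity-checked with a little arithmetic. Assuming the interval is a Wald-type interval that is symmetric on the log scale (an assumption, though a common one for Cox models), the point estimate should sit near the geometric mean of the bounds, and the standard error of log(HR) can be recovered from the interval width. The sketch below applies this to the heart failure estimate quoted above; the function names are hypothetical.

```python
import math


def log_se_from_ci(lower, upper, z=1.96):
    """Standard error of log(HR), assuming a symmetric 95% CI on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def hr_from_ci(lower, upper):
    """Point estimate implied by the bounds: their geometric mean."""
    return math.sqrt(lower * upper)


# Heart failure estimate from the abstract: HR 2.04 (95% CI 1.82 to 2.29).
hr, lower, upper = 2.04, 1.82, 2.29
se = log_se_from_ci(lower, upper)
implied_hr = hr_from_ci(lower, upper)
print(f"SE of log(HR) ~ {se:.3f}; implied HR ~ {implied_hr:.2f} (reported {hr})")
```

The implied and reported values match to two decimal places, which is what one would expect if the interval was computed on the log scale and then rounded.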
White cell count in the normal range and short-term and long-term mortality: international comparisons of electronic health record cohorts in England and New Zealand
OBJECTIVES: Electronic health records offer the opportunity to discover new clinical implications for established blood tests, but international comparisons have been lacking. We tested the association of total white cell count (WBC) with all-cause mortality in England and New Zealand. SETTING: Primary care practices in England (ClinicAl research using LInked Bespoke studies and Electronic health Records (CALIBER)) and New Zealand (PREDICT). DESIGN: Analysis of linked electronic health record data sets: CALIBER (primary care, hospitalisation, mortality and acute coronary syndrome registry) and PREDICT (cardiovascular risk assessments in primary care, hospitalisations, mortality, dispensed medication and laboratory results). PARTICIPANTS: People aged 30-75 years with no prior cardiovascular disease (CALIBER: N=686,475, 92.0% white; PREDICT: N=194,513, 53.5% European, 14.7% Pacific, 13.4% Maori), followed until death, transfer out of practice (in CALIBER) or study end. PRIMARY OUTCOME MEASURE: HRs for mortality were estimated using Cox models adjusted for age, sex, smoking, diabetes, systolic blood pressure, ethnicity and total:high-density lipoprotein (HDL) cholesterol ratio. RESULTS: We found 'J'-shaped associations between WBC and mortality; the second quintile was associated with lowest risk in both cohorts. High WBC within the reference range (8.65-10.05 × 10^9/L) was associated with significantly increased mortality compared to the middle quintile (6.25-7.25 × 10^9/L); adjusted HR 1.51 (95% CI 1.43 to 1.59) in CALIBER and 1.33 (95% CI 1.06 to 1.65) in PREDICT. WBC outside the reference range was associated with even greater mortality. The association was stronger over the first 6 months of follow-up, but similar across ethnic groups. CONCLUSIONS: Clinically recorded WBC within the range considered 'normal' is associated with mortality in ethnically different populations from two countries, particularly within the first 6 months. Large-scale international comparisons of electronic health record cohorts might yield new insights from widely performed clinical tests. TRIAL REGISTRATION NUMBER: NCT02014610
Low eosinophil and low lymphocyte counts and the incidence of 12 cardiovascular diseases: a CALIBER cohort study
BACKGROUND: Eosinophil and lymphocyte counts are commonly performed in clinical practice. Previous studies provide conflicting evidence of association with cardiovascular diseases. METHODS: We used linked primary care, hospitalisation, disease registry and mortality data in England (the CALIBER (CArdiovascular disease research using LInked Bespoke studies and Electronic health Records) programme). We included people aged 30 or older without cardiovascular disease at baseline, and used Cox models to estimate cause-specific HRs for the association of eosinophil or lymphocyte counts with the first occurrence of cardiovascular disease. RESULTS: The cohort comprised 775,231 individuals, of whom 55,004 presented with cardiovascular disease over median follow-up 3.8 years. Over the first 6 months, there was a strong association of low eosinophil counts (<0.05 compared with 0.15-0.25 × 10^9/L) with heart failure (adjusted HR 2.05; 95% CI 1.72 to 2.43), unheralded coronary death (HR 1.94, 95% CI 1.40 to 2.69), ventricular arrhythmia/sudden cardiac death and subarachnoid haemorrhage, but not angina, non-fatal myocardial infarction, transient ischaemic attack, ischaemic stroke, haemorrhagic stroke, subarachnoid haemorrhage or abdominal aortic aneurysm. Low eosinophil count was inversely associated with peripheral arterial disease (HR 0.63, 95% CI 0.44 to 0.89). There were similar associations with low lymphocyte counts (<1.45 vs 1.85-2.15 × 10^9/L); adjusted HR over the first 6 months for heart failure was 2.25 (95% CI 1.90 to 2.67). Associations beyond the first 6 months were weaker. CONCLUSIONS: Low eosinophil counts and low lymphocyte counts in the general population are associated with increased short-term incidence of heart failure and coronary death. TRIAL REGISTRATION NUMBER: NCT02014610; results
Analyzing the heterogeneity of rule-based EHR phenotyping algorithms in CALIBER and the UK Biobank
Electronic Health Records (EHR) are data generated during routine interactions across healthcare settings and contain rich, longitudinal information on diagnoses, symptoms, medications, investigations and tests. A primary use-case for EHR is the creation of phenotyping algorithms used to identify disease status, onset and progression or extraction of information on risk factors or biomarkers. Phenotyping, however, is challenging since EHR are collected for different purposes, have variable data quality and often require significant harmonization. While considerable effort goes into the phenotyping process, no consistent methodology for representing algorithms exists in the UK. Creating a national repository of curated algorithms can potentially enable algorithm dissemination and reuse by the wider community. A critical first step is the creation of a robust minimum information standard for phenotyping algorithm components (metadata, implementation logic, validation evidence), which involves identifying and reviewing the complexity and heterogeneity of current UK EHR algorithms. In this study, we analyzed all available EHR phenotyping algorithms (n=70) from two large-scale contemporary EHR resources in the UK (CALIBER and UK Biobank). We documented EHR sources, controlled clinical terminologies, evidence of algorithm validation, representation and implementation logic patterns. Understanding the heterogeneity of UK EHR algorithms and identifying common implementation patterns will facilitate the design of a minimum information standard for representing and curating algorithms nationally and internationally.
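To make the idea of a minimum information standard concrete, the sketch below shows one way a phenotyping algorithm record could be represented, using the three components the abstract names (metadata, implementation logic, validation evidence). The class, its field names and the example values are hypothetical and are not the standard proposed by the authors.

```python
from dataclasses import dataclass, field


@dataclass
class PhenotypeAlgorithm:
    """Hypothetical record for one EHR phenotyping algorithm."""
    name: str                     # metadata
    data_sources: list            # metadata: e.g. primary care, hospital
    terminologies: list           # metadata: controlled clinical terminologies
    implementation_logic: str     # how codes are combined into a phenotype
    validation_evidence: list = field(default_factory=list)

    def missing_components(self):
        """Return the names of required components that are empty."""
        required = {
            "data_sources": self.data_sources,
            "terminologies": self.terminologies,
            "implementation_logic": self.implementation_logic,
            "validation_evidence": self.validation_evidence,
        }
        return [key for key, value in required.items() if not value]


# Placeholder values for illustration only.
example = PhenotypeAlgorithm(
    name="example_condition",
    data_sources=["primary care"],
    terminologies=["Read", "ICD-10"],
    implementation_logic="any diagnosis code from the code list",
)
print(example.missing_components())
```

A completeness check like `missing_components` is one way a repository could flag records that lack validation evidence before they are shared for reuse.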
Selective recruitment designs for improving observational studies using electronic health records
Large-scale electronic health records (EHRs) present an opportunity to quickly identify suitable individuals in order to directly invite them to participate in an observational study. EHRs can contain data from millions of individuals, raising the question of how to optimally select a cohort of size n from a larger pool of size N. In this article, we propose a simple selective recruitment protocol that selects a cohort in which covariates of interest tend to have a uniform distribution. We show that selectively recruited cohorts potentially offer greater statistical power and more accurate parameter estimates than randomly selected cohorts. Our protocol can be applied to studies with multiple categorical and continuous covariates. We apply our protocol to a numerically simulated prospective observational study using an EHR database of stable acute coronary disease patients from 82,089 individuals in the U.K. Selective recruitment designs require a smaller sample size, leading to more efficient and cost-effective studies.
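The abstract does not spell out the recruitment protocol, so the following is only a rough sketch of the general idea of choosing n individuals from a pool of N so that a covariate is spread roughly uniformly. It bins one covariate and draws from the bins in round-robin order; the function name, the binning rule and the toy pool are all assumptions for illustration and should not be read as the authors' method.

```python
import random
from collections import Counter, defaultdict


def select_uniform_cohort(pool, n, bin_of, seed=0):
    """Pick up to n people so that covariate bins are as evenly filled as possible."""
    rng = random.Random(seed)
    bins = defaultdict(list)
    for person in pool:
        bins[bin_of(person)].append(person)
    for members in bins.values():
        rng.shuffle(members)  # random choice within each bin
    cohort = []
    # Round-robin over bins: each bin contributes one person per pass
    # until the cohort is full or every bin is exhausted.
    while len(cohort) < n and any(bins.values()):
        for key in sorted(bins):
            if bins[key] and len(cohort) < n:
                cohort.append(bins[key].pop())
    return cohort


# Toy pool skewed towards younger patients (values invented).
pool = (
    [{"id": i, "age": 65} for i in range(80)]
    + [{"id": 80 + i, "age": 75} for i in range(15)]
    + [{"id": 95 + i, "age": 85} for i in range(5)]
)
cohort = select_uniform_cohort(pool, 15, lambda p: p["age"] // 10)
print(Counter(p["age"] // 10 for p in cohort))
```

A simple random sample of 15 from this pool would mostly contain people in their sixties, whereas the selected cohort has equal numbers from each age decade, which is the kind of covariate spread the abstract associates with greater statistical power.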